Abstract

Background: Cardiolipin (CL) is an important component of the mitochondrial inner membrane and of bacterial membranes. Its presence in these two biomembranes has been considered evidence for the endosymbiotic origin of mitochondria. However, CL is reported to be synthesized by two distinct enzymes: CLS_cap in eukaryotes and CLS_pld in bacteria. How the CL biosynthesis pathway evolved is therefore an interesting question.

Results: An investigation of the phylogenetic distribution of CL synthase (CLS) showed the following. Most bacteria have the CLS_pld pathway, but the CLS_cap pathway has already appeared in some bacteria, including proteobacteria and actinobacteria. In eukaryotes, Supergroups Opisthokonta and Archaeplastida and Subgroup Stramenopiles, which all contain multicellular organisms, possess the CLS_cap pathway, whereas Supergroups Amoebozoa and Excavata and Subgroup Alveolata, which consist exclusively of unicellular eukaryotes, bear the CLS_pld pathway; amitochondriate protists in all supergroups have neither. Phylogenetic analysis indicated that eukaryotic CLS_cap is most closely related to that of alpha proteobacteria, whereas eukaryotic CLS_pld shares a common ancestor but shows no close relationship to that of any particular bacterial lineage.

Conclusions: The first eukaryotic common ancestor (FECA) inherited CLS_pld from its bacterial ancestor (e.g., the bacterial partner under any of the hypotheses of eukaryote origin). Later, as the FECA evolved into the last eukaryotic common ancestor (LECA), the endosymbiotic mitochondrion (an alpha proteobacterium) brought in CLS_cap. In some LECA individuals CLS_cap then replaced CLS_pld; these gave rise to the protist lineages from which multicellular eukaryotes could arise. In the other LECA individuals CLS_pld was retained and CLS_cap was lost; these gave rise to the protist lineages that possess CLS_pld. In addition, our work indicates that the CL maturation pathway arose after the emergence of eukaryotes, probably through mechanisms such as duplication of other genes, and that gene duplication and loss occurred frequently at different lineage levels, increasing pathway diversity, probably to suit the complicated cellular processes of various cells. Our work also implies that the classification placing Stramenopiles and Alveolata together in Chromalveolata may be unreasonable, and that the absence of CL synthesis and maturation pathways in amitochondriate protists is most probably due to secondary loss.
Accession number

The nucleotide sequences of the Phaeodactylum tricornutum CLS_cap identified by us have been submitted to GenBank under accession numbers JN088191 and JN088192.

Background 

Cardiolipin (CL) is an important phospholipid component of the mitochondrial inner membrane and of bacterial membranes. In mitochondria, CL stabilizes the respiratory complexes and the supercomplexes, which are made up mainly of complexes III and IV [1,2], and helps maintain ATP generation [3,4]; it is also involved in mitochondrial protein import, cell wall biogenesis, translational regulation, aging, and apoptosis [2]. In bacteria, CL interacts with energy-metabolism proteins such as succinate dehydrogenase [5], formate dehydrogenase-N [6], and respiratory complexes [7], is assembled into reaction centers [8,9], and is involved in the proper localization of proteins in the membrane [10,11]. By contrast, CL has not yet been found in archaea [12].

CL is biosynthesized from two molecules of phosphatidylglycerol (PG) in bacteria, but from one PG and one cytidine diphosphate diacylglycerol (CDP-DAG) in eukaryotes (Figure 1) [13]. In bacteria, the reaction is a reversible transesterification catalyzed by a cardiolipin synthase (CLS) containing two phospholipase D (PLDc_2) domains, termed CLS_pld; in eukaryotes, the reaction is irreversible and is catalyzed by a different CLS containing one CDP-alcohol phosphatidyltransferase (CAP) domain, termed CLS_cap. In addition, only in eukaryotes is nascent CL further remodeled into mature CL, which, in a given organism, generally carries the same fatty acids at the sn-1 and sn-2 positions of a molecule [14-16]. The indispensable eukaryotic CL maturation process and its enzymes are as follows. Nascent CL is first deacylated to monolysocardiolipin (MLCL), a step catalyzed by either of two kinds of enzymes: the CL-specific phospholipase CLD1 (YGR110W) identified in yeast [17], or calcium-independent phospholipase A2 (iPLA2) beta or gamma, reported in Drosophila and rat [18,19]. MLCL is then reacylated to mature CL by the CoA-independent transacylase tafazzin (TAZ) [20] or by acyl-CoA:lysocardiolipin acyltransferase 1 (ALCAT1) [21]. Through this process, a high degree of acyl-chain symmetry is established in CL. Bacteria have no such maturation process.

As seen above, the CL biosynthesis and maturation pathways of eukaryotes are distinct from those of bacteria. However, the presence of CL in both bacteria and eukaryotic mitochondria has been considered a line of evidence for the endosymbiotic origin of mitochondria from bacteria [22,23]. According to the endosymbiosis theory, many mitochondrial properties, such as energy metabolism including the respiratory chain, were inherited from the bacterial endosymbiont. The differences between mitochondria and bacteria described above, however, make it uncertain whether this also holds for the CL biosynthesis pathway. How the eukaryotic CL biosynthesis and maturation pathways arose during the origin of eukaryotes from prokaryotes therefore remains a mystery.

Figure 1 Diagrams of the two types of CL synthesis pathway occurring in bacteria and eukaryotic mitochondria, respectively. (a) The CLS_pld pathway in a bacterium; (b) the CLS_cap and maturation pathways in the mitochondrion of a eukaryotic cell. ER, endoplasmic reticulum; PM, plasma membrane; OM, outer membrane of the mitochondrion; IMS, intermembrane space; IM, inner membrane of the mitochondrion. The dashed arrow indicates upstream pathways that are not displayed here.

Moreover, CL has been reported to be absent in some anaerobic protists such as Giardia lamblia [24] and Trichomonas vaginalis [25]. These organisms possess no canonical mitochondria but instead mitosomes or hydrogenosomes, which lack an electron transport chain (ETC), membrane potential, and proton-driven ATP generation [26]. The lack of mitochondria in G. lamblia was once taken by many authors as the main evidence that this organism is the most primitive eukaryote, having diverged from the eukaryotic trunk before the emergence of mitochondria [27-29]. Whether the lack of CL in these 'amitochondriate' protists reflects primitiveness or secondary degeneration is therefore a question that bears on the early evolution of eukaryotes.

To study the origin and evolution of the CL biosynthesis and maturation pathways, we investigated the phylogenetic distribution and phylogeny of the CL biosynthesis and maturation enzymes in diverse bacteria and in diverse eukaryotes from five supergroups: Opisthokonta, Amoebozoa, Archaeplastida, Chromalveolata, and Excavata. Several interesting observations were obtained.

Results 

Phylogenetic distribution of CL biosynthesis enzymes in eukaryotes and their similar sequences in bacteria
CL synthase (CLS) 

Homologs of CLS_cap were identified in Opisthokonta (except the amitochondriate Microsporidia), Archaeplastida, and Stramenopiles of Chromalveolata (except B. hominis, for which no complete genome database is available) (Table 1). These two supergroups and one subgroup contain all the multicellular eukaryotes (Animalia, Fungi, Planta, Chlorophyta, Rhodophyta, and Phaeophyceae) as well as some unicellular eukaryotes (protists). Thus, all multicellular eukaryotes possess CLS_cap, and so do exactly those unicellular eukaryotes that belong to the same supergroups (Opisthokonta and Archaeplastida) or subgroup (Stramenopiles of Chromalveolata) as these multicellular eukaryotes. Generally, each species has only one homolog, but a few, such as H. sapiens, M. musculus, C. elegans, D. melanogaster, S. purpuratus, and H. magnipapillata, have more than one copy (Additional file 1: Table S1). Multiple sequence alignments revealed that most of the identified homologs possess the conserved amino acid residues and membrane-binding regions of CLS_cap [30] (Additional file 2: Figure S1). When the RefSeq protein database was searched with an E-value cutoff of 0.001, many (> 3,000) similar sequences from diverse bacteria followed the eukaryotic homologs in the hit list, although most of them are annotated as CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase (PGPS). To reduce the computational burden, only the top hits (1,500 sequences, E < 1e-18) were included in the analyses below. Among them are two previously reported CLS_cap from two actinobacteria [31], and according to our phylogenetic analysis, many more sequences, from actinobacteria (88 of the 148 sequenced actinobacterial species) and from some other bacteria including diverse proteobacteria, are CLS_cap (data not shown). When these bacterial homologs were aligned to build an HMM profile and the profile was used as a query to search all available prokaryotic genomes, we likewise found that only a small fraction of the surveyed bacteria (172 of 1,375), mainly proteobacteria, actinobacteria, and a few others, possess CLS_cap (data not shown).
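The two-stage selection of similarity-search hits described above (a reporting cutoff of E = 0.001, then only the most significant hits, E < 1e-18, about 1,500 sequences, kept for analysis) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record and the `select_hits` function are hypothetical, and hits are assumed to have been parsed already from a BLAST-style tabular report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One database hit from a similarity search (hypothetical record)."""
    accession: str
    evalue: float


def select_hits(hits: List[Hit],
                report_cutoff: float = 1e-3,
                analysis_cutoff: float = 1e-18,
                max_hits: int = 1500) -> List[Hit]:
    """Keep hits at or below the reporting cutoff, then the best hits below the stricter cutoff."""
    reported = [h for h in hits if h.evalue <= report_cutoff]
    # Lower E-values are more significant, so sort in ascending order.
    reported.sort(key=lambda h: h.evalue)
    strict = [h for h in reported if h.evalue < analysis_cutoff]
    # Cap the number of sequences passed on to alignment and tree building.
    return strict[:max_hits]


if __name__ == "__main__":
    # Toy E-values for illustration only; not data from this study.
    hits = [Hit("a", 1e-30), Hit("b", 1e-5), Hit("c", 0.01), Hit("d", 1e-20)]
    print([h.accession for h in select_hits(hits)])  # ['a', 'd']
```

Sorting before truncation matters here: the cap of 1,500 should retain the most significant sequences rather than an arbitrary subset of the hit list.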

Interestingly, in the other two investigated eukaryotic supergroups and one subgroup, which consist exclusively of unicellular eukaryotes (protists), namely Amoebozoa (except the amitochondriate Entamoebida), Excavata (except the amitochondriate Parabasalia and Diplomonadida), and Alveolata of Chromalveolata, no CLS_cap homologs were identified; CLS_pld homologs were identified instead (Table 1). These homologs all contain the two conserved motifs proposed to be involved in phosphatidyl group transfer [32] (Additional file 3: Figure S2). Many (> 5,000, E-value < 0.001) sequences from diverse bacteria annotated as CLS were also found among the top hits for CLS_pld. To investigate the distribution of CLS_pld in prokaryotes, an HMM profile built from seven genes whose CLS function has been confirmed experimentally [33] was used as a query to search bacterial genomes; CLS_pld homologs were found in most of the investigated bacteria (927 of 1,375). Neither type of CLS was found in archaea.
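For scale, the genome counts reported above can be expressed as percentages. The short Python sketch below only performs that arithmetic on the numbers given in the text; the `share` helper is our own illustration.

```python
def share(count: int, total: int) -> float:
    """Percentage of surveyed genomes carrying a homolog, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts taken from the text: (genomes with a homolog, genomes surveyed).
COUNTS = {
    "CLS_pld in surveyed bacteria": (927, 1375),
    "CLS_cap in surveyed bacteria": (172, 1375),
    "CLS_cap in sequenced actinobacteria": (88, 148),
}

for label, (n, total) in COUNTS.items():
    print(f"{label}: {n}/{total} = {share(n, total)}%")
```

This gives roughly 67.4%, 12.5%, and 59.5%, respectively. The two bacterial counts come from separate profile searches and need not be disjoint; even so, since 927 + 172 = 1,099 < 1,375, at least 276 of the surveyed bacteria appear to carry neither type under these searches.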

None of the investigated eukaryotes contains both types of CLS. Moreover, in all the amitochondriate protists noted in parentheses above (Microsporidia, Entamoebida, Parabasalia, and Diplomonadida), neither type of CLS was found. No CLS has been found in B. hominis either, but this is probably due to its incomplete genome database.
CL-specific phospholipase (CLD) 

Homologs were found in most genomes of four of the five eukaryotic supergroups (all except Amoebozoa), but within these four supergroups some subgroups or species, such as Microsporidia, Ostreococcus, G. sulphuraria, Perkinsida, Apicomplexa, B. hominis, Heterolobosea, Parabasalia, and Diplomonadida, have no identified homolog (Table 1). Two typical CLD motifs, "GXSXG" and "HX4D" [17], which are considered to function in lipase and acyltransferase activity, respectively, were found in almost all of the identified homologs (Additional file 4: Figure S3). Many (> 5,000, E-value < 0.001) similar bacterial sequences also followed the eukaryotic homologs in the hit list, but most of them were annotated as "alpha/beta hydrolase" or "hypothetical protein". We chose only those very close to the eukaryotic sequences in the hit list for the phylogenetic analyses below.

Table 1 The phylogenetic distribution of CL biosynthesis and maturation enzymes in the five eukaryotic supergroups

Calcium-independent phospholipase A2 (iPLA2) 

Because hits for iPLA2 beta and gamma were intermixed in the hit list owing to the high sequence similarity between the two enzymes, they were distinguished by the phylogenetic analyses below. Homologs of iPLA2 gamma were found in most genomes of four of the five supergroups (all except Amoebozoa), and homologs of iPLA2 beta in all animals and one fungus in Opisthokonta and in two species of Amoebozoa. No homolog of either iPLA2 was found in many subgroups and species, such as Choanoflagellates, most fungi (except A. fumigatus), Entamoebida, G. sulphuraria, Ciliata, Cryptosporidium, Oomycetes, B. hominis, Parabasalia, and Diplomonadida. However, many other fungi not listed in Table 1 were found to have iPLA2 homologs when the RefSeq protein database was searched. Some organisms possess multiple homologs (Additional file 1: Table S1). Most of the identified homologs possess the two conserved segments characteristic of iPLA2 [34] (Additional file 5: Figure S4). Many similar bacterial sequences annotated as "patatin" followed these eukaryotic homologs in the hit list; only the top hits (> 500 sequences for each query, E-value < 0.001) were selected for the phylogenetic analyses below.

Acyl-CoA:lysocardiolipin acyltransferase 1 (ALCAT)

Besides annotated ALCAT, homologs of other eukaryotic enzymes such as "1-acylglycerol-3-phosphate O-acyltransferase (AGPAT) 3, 4, 5" and "lysophosphatidylglycerol acyltransferase (LPGAT)" were also found in the genomes of all five supergroups when the RefSeq protein database was searched. Because of the high sequence similarity among them, their identities were further determined by the phylogenetic analyses below. No homolog was found in several subgroups and species, including D. melanogaster, Microsporidia, Entamoebida, most Chlorophyta (except M. sp), Rhodophyta, Alveolata, T. pseudonana, Heterolobosea, Diplomonadida, and Parabasalia (Table 1). Many (> 1,000, E-value < 0.001) similar bacterial sequences also followed the eukaryotic homologs in the hit list; their relationship to the eukaryotic ALCAT homologs was determined by the phylogenetic analyses below.
Tafazzin (TAZ) 

Homologs were found in all five supergroups but not in several subgroups and species, such as Microsporidia, Entamoebida, Alveolata, Kinetoplastids, Parabasalia, Diplomonadida, S. pombe, P. sojae, and P. tricornutum (Table 1). Bacterial sequences, mostly annotated as "acyltransferase", also followed the eukaryotic TAZ homologs in the hit list. However, they have very low sequence similarity to eukaryotic TAZ homologs, and our preliminary phylogenetic analysis does not support a close relationship between them and eukaryotic TAZ, so they were not included in further analyses.

In brief, the distribution of the maturation pathway enzymes falls into three conditions: 1) none of the enzymes exists in Microsporidia, Entamoebida, Cryptosporidium, Parabasalia, or Diplomonadida; 2) only one or two enzymes exist in some protists, including G. sulphuraria, Alveolata (except Cryptosporidium), and B. hominis, and these cannot form the complete two-step maturation pathway; 3) all the other eukaryotes possess most of the enzymes and can form the complete two-step maturation pathway.
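The three conditions above amount to a simple rule on the two maturation steps: deacylation of CL to MLCL (CLD, iPLA2 beta, or iPLA2 gamma) and reacylation of MLCL (TAZ or ALCAT). The Python sketch below encodes that rule, treating a pathway as complete when at least one enzyme is present for each step. The enzyme labels and the example enzyme sets are hypothetical illustrations, not entries from Table 1.

```python
from typing import FrozenSet, Set

# Enzymes reported to act at each maturation step (labels are our own shorthand).
DEACYLATION: FrozenSet[str] = frozenset({"CLD", "iPLA2_beta", "iPLA2_gamma"})  # CL -> MLCL
REACYLATION: FrozenSet[str] = frozenset({"TAZ", "ALCAT"})                      # MLCL -> mature CL


def maturation_status(enzymes: Set[str]) -> str:
    """Classify a genome's maturation enzymes as 'absent', 'incomplete', or 'complete'."""
    step1 = bool(enzymes & DEACYLATION)
    step2 = bool(enzymes & REACYLATION)
    if step1 and step2:
        return "complete"      # condition 3: both steps covered
    if step1 or step2:
        return "incomplete"    # condition 2: some enzymes, but one step missing
    return "absent"            # condition 1: no maturation enzymes at all


if __name__ == "__main__":
    # Hypothetical enzyme sets, for illustration only.
    for genome in [set(), {"CLD"}, {"iPLA2_gamma", "TAZ"}]:
        print(sorted(genome), maturation_status(genome))
```

Note that the rule counts steps, not enzymes: two enzymes acting at the same step (for example TAZ and ALCAT alone) still leave the pathway incomplete.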

Phylogeny of CL biosynthesis enzymes 

Because the Maximum Likelihood (ML) and Bayesian trees showed similar topologies, we present the Bayesian tree as a representative, with the ML bootstrap values also shown on it. (Similar results were obtained for the other enzymes below, so Bayesian trees were likewise chosen as representatives.)

On the CLS_cap phylogenetic tree (Figure 2; for the ML tree see Additional file 6: Figure S5), all the identified eukaryotic homologs are recovered in one highly supported, large monophyletic clade (Clade E). Within this clade, homologs from Opisthokonta, Archaeplastida, and Stramenopiles of Chromalveolata form three subclades with high support values, and within these subclades many groups corresponding to their source lineages were also recovered. Furthermore, multiple homologs from the same species always cluster together first, suggesting that they are products of species-specific gene duplication. A clade consisting of all the homologs from alpha proteobacteria was recovered as the closest sister group of Clade E with moderate support (0.73/54), with all the homologs of other diverse bacteria as its outgroups. Among these outgroups, the actinobacterial clade, which contains the two previously reported actinobacterial CLS_cap [31], is the outermost group, suggesting that all the homologs in these outgroups are CLS_cap. Finally, PGPS from diverse bacteria form an outgroup to all the above clades. Our results therefore suggest that, besides actinobacteria as reported previously, CLS_cap might already have emerged in some other bacteria, including diverse proteobacteria, and that eukaryotes might have acquired their CLS_cap from alpha proteobacteria.
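Node labels such as 0.73/54 above combine two support measures: the Bayesian posterior probability (left of the slash) and the ML bootstrap percentage (right of the slash). The Python sketch below parses such labels and applies support thresholds. The thresholds used here (posterior >= 0.95, bootstrap >= 70) are common conventions that we assume for illustration, not cutoffs stated in this work, and the use of '-' for a missing value is likewise an assumption.

```python
from typing import Optional, Tuple


def parse_support(label: str) -> Tuple[Optional[float], Optional[int]]:
    """Split a 'posterior/bootstrap' node label; '-' or an empty field means missing."""
    left, _, right = label.partition("/")
    left, right = left.strip(), right.strip()
    posterior = float(left) if left not in ("", "-") else None
    bootstrap = int(right) if right not in ("", "-") else None
    return posterior, bootstrap


def is_well_supported(label: str, min_pp: float = 0.95, min_bs: int = 70) -> bool:
    """True only if both measures are present and meet the (assumed) thresholds."""
    pp, bs = parse_support(label)
    return pp is not None and bs is not None and pp >= min_pp and bs >= min_bs


if __name__ == "__main__":
    print(parse_support("0.73/54"))      # (0.73, 54)
    print(is_well_supported("0.73/54"))  # False: moderate support only
    print(is_well_supported("1.00/98"))  # True
```

Requiring both measures to pass is a conservative design choice; reporting them side by side, as on the trees here, preserves more information than a single pass/fail flag.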

On the CLS_pld phylogenetic tree (Figure 3), all the identified eukaryotic homologs are likewise recovered in one highly supported, large monophyletic clade (Clade E). Within this clade, homologs form three subclades corresponding almost exactly to their three source groups: Alveolata of Chromalveolata, Amoebozoa, and Excavata; within these subclades, homologs also form groups corresponding to their source lineages (e.g., Apicomplexa, Perkinsida, and Ciliata). However, Clade E does not show a particularly close relationship with similar sequences from any extant bacterial lineage. These results suggest that all the CLS_pld of these three eukaryotic groups (which consist exclusively of unicellular organisms, i.e., protists) share a common ancestor that does not fall within any extant bacterial lineage.

Phylogeny of CL maturation enzymes 

Because of their very low sequence similarity to TAZ and ALCAT, the similar bacterial sequences of these two enzymes were not included in the final phylogenetic analysis. The four resulting phylogenetic trees (Additional file 7: Figure S6, Additional file 8: Figure S7, Additional file 9: Figure S8, Additional file 10: Figure S9, and Additional file 11: Table S2) showed the following. 1) All the eukaryotic homologs of each enzyme first cluster together with high support values; none of these enzymes shows a close relationship with any particular bacterial sequences, and the low support values do not support direct phylogenetic relationships with any bacterial sequences. This suggests that the enzymes were not inherited directly from bacterial ancestors but arose after the emergence of eukaryotes, and that each enzyme in all eukaryotes has a common ancestor which might have already emerged in the last eukaryotic common ancestor (LECA) of the five supergroups. 2) Homologs of each enzyme from the same supergroup or lineage (e.g., Animalia, Fungi, Oomycetes, and Planta) do not form a single clade corresponding to their source supergroup or lineage but usually form two or more separate clades, and alternative trees constraining them to be monophyletic were significantly rejected (Additional file 11: Table S2), suggesting that gene duplication and loss occurred frequently in these enzymes at different lineage levels. Moreover, all the ALCAT homologs form a sister group to the AGPAT 3/4 clade, suggesting that ALCAT arose through gene duplication of, and divergence from, AGPAT 3/4. Finally, multiple homologs of each enzyme from the same species generally cluster together, suggesting that gene duplication of these enzymes has continued relatively recently in some species.
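The inference above, that copies from one species clustering together points to species-specific duplication, rests on a simple test: do those copies form a clade on their own? The Python sketch below checks this on a rooted tree written as nested tuples. The tree encoding and leaf names are hypothetical; a real analysis would read trees from Newick files with a phylogenetics library rather than hand-written tuples.

```python
from typing import FrozenSet, List, Tuple, Union

# A rooted tree: a leaf name, or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every subtree (clade) in the rooted tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found = [leaves(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def is_monophyletic(tree: Tree, taxa: FrozenSet[str]) -> bool:
    """True if exactly these taxa, and no others, make up one clade."""
    return frozenset(taxa) in clades(tree)


if __name__ == "__main__":
    # Hypothetical trees with two copies (spA_1, spA_2) from one species.
    together = ((("spA_1", "spA_2"), "spB_1"), "spC_1")
    apart = (("spA_1", "spB_1"), ("spA_2", "spC_1"))
    copies = frozenset({"spA_1", "spA_2"})
    print(is_monophyletic(together, copies))  # True
    print(is_monophyletic(apart, copies))     # False
```

The result depends on the root chosen, and a monophyletic group of copies is consistent with, but does not by itself prove, a species-specific duplication.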

Discussion 

The origin and evolution of CL biosynthesis pathways in eukaryotes

As mentioned above, CL is biosynthesized by two distinct synthases, CLS_cap and CLS_pld. The two enzymes belong to two distinct protein families with no primary sequence similarity between them [16]. It is generally considered that eukaryotes have CLS_cap and bacteria have CLS_pld. However, our investigation revealed that although most bacteria possess CLS_pld, some bacteria, including actinobacteria, proteobacteria, and a few others, bear CLS_cap, suggesting that CLS_cap had in fact already arisen in some bacteria. In eukaryotes, all the multicellular organisms, and only those unicellular organisms (protists) that belong to the same supergroups or subgroup as these multicellular organisms, possess CLS_cap. Our phylogenetic analysis further showed that the CLS_cap of all these eukaryotes is most closely related to that of alpha proteobacteria. Since alpha proteobacteria are generally considered to be the endosymbiotic ancestor of mitochondria [35-37], the CLS_cap pathway in these eukaryotes most probably originated from alpha proteobacteria through the mitochondrial endosymbiotic event. This is inconsistent with the earlier proposal that eukaryotic CLS originated from a prokaryotic-type PGPS that existed in ancestral eukaryotes [38].

On the other hand, our investigation revealed that all the other eukaryotes, whose supergroups or subgroup consist exclusively of unicellular eukaryotes (protists), possess CLS_pld. A few of these lineages, such as Trypanosoma, Leishmania, Theileria, Plasmodium, Cryptosporidium, and Dictyostelium, had previously been reported by other authors to have CLS_pld, and this was explained as an evolutionary survival of the prokaryotic CL-forming reaction into the eukaryotic kingdom [38]. Indeed, CL has been reported in such eukaryotes as D. discoideum, T. thermophila, P. tetraurelia, P. marinus, and T. cruzi [39-43]. According to our present work, however, 1) CLS_pld is widely distributed in many kinds of protists (all except those in Supergroups Opisthokonta and Archaeplastida and in Stramenopiles of Supergroup Chromalveolata) and forms a complementary distribution with CLS_cap across the entire eukaryotic domain (mainly within protists); and 2) on the phylogenetic tree, all the CLS_pld from different eukaryotes (protists) cluster together in a single clade without a close relationship to the CLS_pld of any particular extant bacterial lineage, suggesting that they have a common ancestor that is probably very ancient and has not been preserved without obvious changes in any extant bacterial lineage. Therefore, the CLS_pld in these eukaryotes cannot be secondary acquisitions through independent horizontal gene transfer (HGT) from different bacteria into different protist lineages, but must have been inherited from a common ancestor of these eukaryotes. Furthermore, 1) such a common ancestor can only be the last eukaryotic common ancestor (LECA) or the first eukaryotic common ancestor (FECA); 2) most bacteria (except some, mainly proteobacteria and actinobacteria, which bear the CLS_cap pathway) possess the CLS_pld pathway, and the emergence of CLS_cap in some bacteria might have occurred much later than that of CLS_pld; and 3) unlike that of eukaryotic CLS_cap, the ancestor of eukaryotic CLS_pld cannot be found among extant bacteria, so eukaryotes might have acquired CLS_pld very anciently (probably earlier than the endosymbiotic origin of mitochondria from alpha proteobacteria). It is therefore most probable that the FECA inherited the CLS_pld pathway from an ancient bacterium, such as the bacterial partner in the 'fusion hypothesis' [44], the proto-eukaryote derived from bacteria in the 'phagotrophy hypothesis' [45], or the bacterium related to the origin of the nucleus in the 'endosymbiosis hypothesis' [46-48].

Neither CLS_cap nor CLS_pld was found in any of the investigated amitochondriate protists, regardless of which eukaryotic supergroup (Opisthokonta, Amoebozoa, or Excavata) they belong to. This is consistent with the lack of CL in such organisms as G. lamblia, T. vaginalis, and E. cuniculi [24,25,49]. Since bacteria and all the other eukaryotes have CL and the corresponding biosynthesis pathways, the absence of both CL biosynthesis pathways in these amitochondriate protists must be the result of secondary loss due to the degeneration of their mitochondria. Consistently, anaerobic prokaryotes have been shown to lack CL, and anaerobic conditions decrease CL levels in yeast relative to aerobic conditions [50,51]. The presence of CL in Tritrichomonas foetus, a relative of T. vaginalis [23], further supports the view that such a secondary loss occurred at least in T. vaginalis. The lack of either type of CLS in B. hominis might also be due to its lack of mitochondria or to its incomplete genome database.

Considering the distinctive differences in phospholipids between archaea on the one hand and bacteria and eukaryotes on the other [52], and the absence of either type of CLS in archaea, it is reasonable to postulate that archaea may not have contributed to the origin of eukaryotic CL biosynthesis. Based on the above analyses, we therefore propose the following evolutionary scenario for the CL biosynthesis pathway in eukaryotes (Figure 4). During the origin and evolution of eukaryotes, the FECA inherited the CLS_pld pathway from its bacterial ancestor, probably the bacterial partner under any of the hypotheses of eukaryote evolution, such as the 'fusion hypothesis', the 'phagotrophy hypothesis', and the 'endosymbiosis hypothesis'. Later, as the FECA evolved into the LECA, the endosymbiotic origin of the mitochondrion brought in another CL synthase, CLS_cap, which had already arisen in the endosymbiotic bacteria, the alpha proteobacteria. Then, in those LECA individuals that would evolve into the unicellular eukaryotic lineages (e.g., Choanoflagellates, Chlorophyta) from which multicellular eukaryotes (e.g., Animalia and Fungi in Opisthokonta, Archaeplastida, and Phaeophyceae in Chromalveolata) could arise, the endosymbiont-derived CLS_cap gene was transferred into the nuclear genome of the host cell and replaced the earlier CLS_pld pathway. In the other LECA individuals, which would evolve only into unicellular protist lineages (e.g., Amoebozoa, Alveolata of Chromalveolata, and Excavata) from which no multicellular eukaryotes would arise, the earlier CLS_pld was retained and the endosymbiont-derived CLS_cap was lost. In the amitochondriate protists (including Microsporidia), the CL biosynthesis pathway (either CLS_pld or CLS_cap) was later lost entirely owing to the secondary degeneration of their mitochondria.

Figure 4 An evolutionary route chart of the CL biosynthesis and maturation pathways in eukaryotes. Short black bar, CLS_cap gene; short red bar, CLS_pld gene; grey dot, CL maturation pathway genes; hollow dot, absence of the CL maturation pathway. EGT, endosymbiotic gene transfer from the proto-mitochondrion to the nucleus.

The origin and evolution of CL maturation pathway in eukaryotes

The eukaryotic CL maturation pathway consists of two steps, and five enzymes in total have previously been identified as participating in this process in different eukaryotes.

CL maturation is indispensable in higher eukaryotes, though its purpose is not very clear. Our phylogenetic analyses indicated that all the maturation enzymes arose after the emergence of eukaryotes and might already have emerged before the divergence of the eukaryotic supergroups. Except for ALCAT, which seems to have arisen through gene duplication and divergence from an existing enzyme (AGPAT 3/4), the origins of these enzymes remain unclear.

Our phylogenetic analyses also indicated that gene duplication and loss occurred frequently at different lineage levels during the evolution of the maturation pathway. These duplications and losses produced a patchy distribution of the maturation enzymes across diverse eukaryotes, increasing the diversity of the pathway. Different enzymes, or multiple homologs, acting at the same step of the pathway can broaden the recognition of substrates carrying different fatty acid substituents; iPLA2 beta and gamma are one reported example [53]. This feature of the maturation pathway might result from adaptive evolution to cope with the complicated cellular processes of various eukaryotic cells.

In the present work, we found that, except for some unicellular eukaryotes, including all the amitochondriate protists (Microsporidia, Entamoebida, Parabasalia, and Diplomonadida), all Alveolata of Chromalveolata, and a few other species (e.g., G. sulphuraria, B. hominis), all other eukaryotes, which are distributed across all five eukaryotic supergroups and may be unicellular or multicellular, parasitic or free-living, possess a complete CL maturation pathway, having at least one enzyme for each step. The absence of the complete pathway in B. hominis and G. sulphuraria might be caused by their incomplete databases, and its absence in the other protists is probably due to various secondary losses, because 1) each enzyme of the pathway from various lineages forms a monophyletic group on the phylogenetic trees, 2) close relatives of these protists have the pathway, and 3) some, though not a complete set, of the pathway enzymes appear in some of these protists. The total absence of the pathway (no enzymes at all) in amitochondriate protists is consistent with their lack of typical mitochondria and CL and must be due to the degeneration of their mitochondria. By contrast, the presence of only some of the pathway enzymes in certain protists (e.g., Ciliata, Perkinsida, most Apicomplexa) might suggest that their maturation pathways are in the process of being lost, or that the remaining enzymes have other functions.

Implications for the evolution of eukaryotes and the classification of the five eukaryotic supergroups

According to the above analysis of the phylogenetic distribution and phylogeny of the two types of CLS in eukaryotes, the acquisition of the CLS_cap pathway through mitochondrial endosymbiosis might have offered some potential for the evolution of multicellularity, because the CLS_pld pathway exists exclusively in unicellular eukaryotes (protists), whereas the CLS_cap pathway is found in all multicellular organisms and only in those unicellular eukaryotes (protists) that belong to the same supergroups or subgroup as these multicellular organisms. Our work therefore implies, for the first time, that the endosymbiotic event involving alpha proteobacteria not only led to the origin of mitochondria but also might have affected the subsequent evolution of eukaryotes, such as the evolution of multicellularity, depending on which genes of the endosymbiont were transferred into the host nucleus and thus on which kinds of endosymbiotic relationships were established.

The classification and relationships of the five eukaryotic supergroups remain controversial [54-57]. The present work showed that the CL biosynthesis and maturation pathways differ greatly between the two subgroups of Supergroup Chromalveolata: Stramenopiles possess the CLS_cap pathway and a complete maturation pathway, while Alveolata bear the CLS_pld pathway and an incomplete maturation pathway (completely lacking the second step). Therefore, the classification placing these two subgroups in a common supergroup may be unreasonable.

Amitochondriate protists were once thought to be the most primitive extant eukaryotes because of their lack of mitochondria and other primitive characteristics [28,29,58]. However, accumulating molecular evidence and the identification of atypical mitochondria (mitosomes or hydrogenosomes) in these organisms suggest that they might once have possessed mitochondria [59-62]. Our investigation indicates that the absence of CL biosynthesis and maturation pathways in these amitochondriate protists might be due to secondary losses. Thus, the atypical mitochondria in these protists might also have resulted from the degeneration of once-existing typical mitochondria.

Conclusions 

We propose that the FECA inherited the CLS_pld pathway from its bacterial ancestor (which could be the bacterial partner according to the 'fusion hypothesis', the 'phagotrophy hypothesis' or the 'endosymbiosis hypothesis' for the origin of eukaryotes from prokaryotes). Later, when the FECA evolved into the last eukaryote common ancestor (LECA), the endosymbiotic mitochondria (alpha proteobacteria) brought in another pathway, the CLS_cap pathway. Then, in some LECA individuals, the CLS_cap pathway replaced the previous CLS_pld pathway, and these LECA would evolve into the protist lineages from which multicellular eukaryotes could arise, while in the other LECA individuals the previous CLS_pld pathway was kept and the CLS_cap pathway was lost, and these LECA would evolve into the current protist lineages that possess the CLS_pld pathway.
In addition, our work indicates that the CL maturation pathway arose after the emergence of eukaryotes, probably through mechanisms such as the duplication of other pre-existing genes, and that gene duplication and loss occurred frequently at different lineage levels, increasing the diversity of the pathway, probably to fit the complicated cellular processes of various cells. Moreover, our work implies that the kind of endosymbiotic relationship established during the evolutionary origin of the mitochondrion in early eukaryotes might have affected the subsequent evolution of multicellularity; that the classification placing Stramenopiles and Alveolata together in Chromalveolata may be unreasonable; and that the absence of CL synthesis and maturation pathways in amitochondriate protists is most probably due to secondary degeneration.

Methods 

Organisms 

The following organisms with genome or expressed sequence tag (EST) databases were taken as representatives of the five eukaryotic supergroups in this study (Additional file 12: Table S3):
1) Opisthokonta: Animalia (vertebrates: Homo sapiens, Mus musculus, Xenopus laevis, Gallus gallus, Danio rerio; invertebrates: Drosophila melanogaster, Caenorhabditis elegans, Hydra magnipapillata, Strongylocentrotus purpuratus, Schistosoma mansoni, Ciona intestinalis), Choanoflagellata (Monosiga brevicollis), and Fungi (Ascomycota [Saccharomyces cerevisiae, Schizosaccharomyces pombe, Aspergillus fumigatus], Basidiomycota [Ustilago maydis, Cryptococcus neoformans], Microsporidia [Encephalitozoon cuniculi, E. intestinalis, Enterocytozoon bieneusi]);
2) Amoebozoa: Mycetozoa [Dictyostelium discoideum, D. purpureum], Entamoebida [Entamoeba histolytica, E. dispar, E. invadens];
3) Archaeplastida: Planta [Arabidopsis thaliana, Oryza sativa], Chlorophyta [Chlamydomonas reinhardtii, Ostreococcus lucimarinus, O. tauri, Micromonas sp. RCC299], Rhodophyta [Cyanidioschyzon merolae, Galdieria sulphuraria];
4) Chromalveolata: Alveolata (Ciliata [Tetrahymena thermophila, Paramecium tetraurelia], Perkinsida [Perkinsus marinus], Apicomplexa [Plasmodium knowlesi, P. vivax, P. falciparum, P. chabaudi, P. yoelii yoelii, Cryptosporidium parvum, C. hominis, C. muris, Toxoplasma gondii, Babesia bovis, Theileria parva, T. annulata]) and Stramenopiles (Blastocystis hominis, Oomycetes [Pythium ultimum BR144, Phytophthora sojae, P. ramorum, P. infestans, Saprolegnia parasitica CBS 22], Bacillariophyta [Thalassiosira pseudonana CCMP1335, Phaeodactylum tricornutum CCAP1055/1], Phaeophyceae [Ectocarpus siliculosus]);
5) Excavata: Heterolobosea [Naegleria gruberi], Kinetoplastids [Leishmania braziliensis, L. infantum, L. major, Trypanosoma brucei, T. cruzi], Diplomonadida [Giardia lamblia str. WB], Parabasalia [Trichomonas vaginalis].
Their genome or EST databases were downloaded. In addition, other eukaryotes and various prokaryotes were included in this study through BLASTp searches against the NCBI RefSeq protein database (Release 44, January 2011).

CL biosynthesis and maturation pathway gene collection and identification

All the reviewed eukaryotic CLS sequences (Q07560, O01916, Q8MZC4, Q9UJA2, Q80ZM8, Q5U2V5, and B6TPV7), bacterial CLS sequences (127 sequences; their accession IDs and sequences can be obtained from the authors upon request), and reviewed TAZ sequences (Q9V6G5, Q16635, Q6IV77, Q06510, Q6IV84, Q6IV76, Q6IV83, Q6IV82, Q6IV78, and Q54DX7) were downloaded from UniProt. As only a few reviewed CLD1, PLA2 and ALCAT sequences are available in UniProt, the curated orthologs of CLD1 (K13535) and ALCAT (K13513) were downloaded from the KEGG database. For iPLA2 beta (CG6718) and gamma (Q9NP80), their putative orthologs (beta: 15 sequences; gamma: 14 sequences; their accession IDs and sequences can be obtained from the authors upon request) were retrieved from the KEGG SSDB database (hits with a best-best relationship and identity > 0.5). These sequences were aligned with MUSCLE v3.8.31
[63]. Then, an HMM profile of each enzyme was built and calibrated from its multiple sequence alignment with the HMMER package (v3.0) using default parameters. Finally, the resulting profiles were used as queries to search the genome databases of the organisms listed above and the RefSeq protein database with hmmsearch. The hits with significant E-values were further analyzed with Pfam to confirm whether they are true homologs. To exclude the ankyrin-repeat (ANK) domains of iPLA2 beta (CG6718 and its orthologs), the corresponding N-terminal regions were removed according to the Pfam annotation before the HMM profiles were built. If no similar sequence was detected for a given species, its non-redundant (nr) protein and nucleotide databases and its online genome database were searched independently with BLASTp or tBLASTn. The EST database of G. sulphuraria was searched with tBLASTn.
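The align, build and search workflow described above can be sketched as a short driver script. This is a minimal sketch, not the authors' script: it assumes MUSCLE 3.8 and HMMER3 are on the PATH, and all file names (for example `taz_seqs.fasta`, `proteome.fasta`) are hypothetical. It builds the command lines and parses the per-sequence hit table that `hmmsearch --tblout` writes. The Pfam confirmation step is not shown.

```python
import subprocess


def build_commands(enzyme: str, seqs_fasta: str, target_db: str) -> list[list[str]]:
    """Command lines for one enzyme: MUSCLE alignment, HMMER3 profile, profile search."""
    aln = f"{enzyme}.afa"            # hypothetical output names
    hmm = f"{enzyme}.hmm"
    tbl = f"{enzyme}_hits.tbl"
    return [
        ["muscle", "-in", seqs_fasta, "-out", aln],           # MUSCLE 3.8 syntax
        ["hmmbuild", "--informat", "afa", hmm, aln],          # aligned FASTA input
        ["hmmsearch", "--tblout", tbl, hmm, target_db],       # tabular per-sequence hits
    ]


def run_commands(commands: list[list[str]]) -> None:
    """Run each step in order; stop with an error if any tool fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


def parse_tblout(text: str) -> list[dict]:
    """Parse hmmsearch --tblout text: skip '#' comment lines, read the leading columns."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=18)    # column 19 is the free-text description
        hits.append({
            "target": fields[0],            # target sequence name
            "query": fields[2],             # profile (query) name
            "evalue": float(fields[4]),     # full-sequence E-value
            "score": float(fields[5]),      # full-sequence bit score
        })
    return hits
```

For example, `run_commands(build_commands("TAZ", "taz_seqs.fasta", "proteome.fasta"))` would produce `TAZ_hits.tbl`, which `parse_tblout` can read once the file's text is loaded.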

Bacterial sequences similar to each of these enzymes were also collected during the searches against the RefSeq protein database. Because many such bacterial sequences were found below the E-value cutoff of 0.001, as many as possible were collected at first, and then only a subset of them, chosen on the basis of preliminary phylogenetic analyses, was kept for further analyses.
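The initial collection step (all bacterial hits below E = 0.001) can be illustrated as a simple filter. This is a hedged sketch: the `Hit` record and its `domain` field are hypothetical stand-ins for however taxonomy is attached to each hit, and keeping only the best E-value per (profile, target) pair is an assumption made to avoid counting one sequence twice.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    target: str    # sequence identifier
    query: str     # enzyme profile name, e.g. "TAZ"
    domain: str    # hypothetical annotation: "Bacteria" or "Eukaryota"
    evalue: float


def collect_bacterial_hits(hits: list[Hit], max_evalue: float = 1e-3) -> list[Hit]:
    """Keep bacterial hits below the E-value cutoff, best hit per (query, target)."""
    best: dict[tuple[str, str], Hit] = {}
    for hit in hits:
        if hit.domain != "Bacteria" or hit.evalue >= max_evalue:
            continue
        key = (hit.query, hit.target)
        if key not in best or hit.evalue < best[key].evalue:
            best[key] = hit
    # Sort by profile, then by significance (smallest E-value first).
    return sorted(best.values(), key=lambda h: (h.query, h.evalue))
```

The later pruning to a phylogenetically informative subset is a separate, tree-based step and is not captured by this filter.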

Phylogenetic analyses 

In order to infer the origin of eukaryotic CL biosynthesis 
and maturation enzymes, all the sequences obtained above 
were used for the following phylogenetic analyses. 

Multiple alignment of each dataset was initially carried 
out using MUSCLE, version 3.8.31 [63]. Nonhomolo- 
gous insertions and sequence characters that could not 
be aligned with confidence were removed manually. 
Only unambiguously aligned sites were used for phylo- 
genetic analyses. 
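The removal of poorly aligned columns was done manually in this study. For readers who want an automated analogue, a common substitute is to drop alignment columns whose gap fraction exceeds a threshold. The sketch below shows that swapped-in technique; the 0.5 threshold is a hypothetical choice and does not reproduce the authors' manual curation.

```python
def trim_gappy_columns(alignment: dict[str, str],
                       max_gap_fraction: float = 0.5) -> dict[str, str]:
    """Drop columns in which more than `max_gap_fraction` of sequences have a gap."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns to keep; '-' and '.' are both treated as gaps.
    keep = [
        i for i in range(length)
        if sum(s[i] in "-." for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}
```

A gap threshold only approximates "unambiguously aligned"; manual inspection can also remove gap-free columns that are poorly conserved.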

Phylogenetic trees were inferred using maximum likelihood (ML) and Bayesian methods. ML trees were inferred with FastTree 2.1 [64] using the default CAT model and other default settings. MrBayes 3.1.2 [65] was used to perform parallel Bayesian analyses with four incrementally heated Markov chains (temperature set to 0.5), sampled every 1,000 generations. Among-site substitution rate heterogeneity was modelled with a proportion of invariable sites and eight Γ-distributed rate categories, combined with the WAG model of amino acid substitution [66], abbreviated herein as WAG+I+8Γ. Two separate runs were performed to confirm the convergence of the chains; the average standard deviation of split frequencies and the potential scale reduction factor were used as convergence diagnostics for the two runs. Trees below the observed stationarity level were discarded, resulting in a burn-in that comprised 25% of the posterior distribution of trees. The 50% majority-rule consensus tree was determined to calculate the posterior probability of each node.
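The post-processing described above (25% burn-in, clade posterior probabilities, 50% majority-rule consensus, and the average standard deviation of split frequencies between two runs) can be illustrated on toy data. This is a simplified sketch, not MrBayes code: each sampled tree is encoded, hypothetically, as a frozenset of clades (each clade a frozenset of taxon names). The 0.10 minimum split frequency and the n − 1 standard deviation are assumptions about how MrBayes computes this diagnostic.

```python
from collections import Counter
from statistics import stdev


def post_burnin(trees: list, burnin_fraction: float = 0.25) -> list:
    """Discard the first 25% of sampled trees (the burn-in)."""
    return trees[int(len(trees) * burnin_fraction):]


def clade_frequencies(trees: list) -> dict:
    """Fraction of sampled trees containing each clade (its posterior probability)."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def majority_rule(trees: list, threshold: float = 0.5) -> dict:
    """Clades present in more than half of the trees, with their posterior probabilities."""
    return {c: f for c, f in clade_frequencies(trees).items() if f > threshold}


def asdsf(run1: list, run2: list, min_freq: float = 0.1) -> float:
    """Average standard deviation of split frequencies between two runs."""
    f1, f2 = clade_frequencies(run1), clade_frequencies(run2)
    splits = [c for c in set(f1) | set(f2)
              if max(f1.get(c, 0.0), f2.get(c, 0.0)) >= min_freq]
    if not splits:
        return 0.0
    return sum(stdev([f1.get(c, 0.0), f2.get(c, 0.0)]) for c in splits) / len(splits)
```

Values of this diagnostic approaching zero indicate that the two independent runs are sampling similar tree distributions.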

Before the above phylogenetic analyses, larger data sets containing many more bacterial sequences were usually analyzed in a preliminary step with FastTree 2.1 using default parameters. Then only the sub-datasets containing the eukaryotic sequences and the bacterial sequences most closely related to them on the preliminary trees were selected for further analysis.
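One simple way to mechanize this selection is to take the smallest clade containing all eukaryotic sequences together with its sister group(s). The sketch below is a hypothetical heuristic, not the authors' procedure. It assumes a rooted tree encoded as nested tuples of leaf names, whereas real FastTree output is an unrooted Newick tree that would first need parsing and rooting.

```python
from typing import Union

Tree = Union[str, tuple]   # a leaf name, or a tuple of child subtrees


def leaves(node: Tree) -> set[str]:
    """All leaf names under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def select_subset(root: Tree, eukaryotes: set[str]) -> set[str]:
    """Leaves of the smallest clade holding all eukaryotes, plus its sister group(s)."""
    if not eukaryotes:
        raise ValueError("need at least one eukaryotic sequence")
    parent, node = None, root
    # Descend while a single child still contains every eukaryotic leaf.
    while isinstance(node, tuple):
        child = next((c for c in node if eukaryotes <= leaves(c)), None)
        if child is None:
            break
        parent, node = node, child
    selected = leaves(node)
    if parent is not None:
        selected |= leaves(parent)   # parent = eukaryotic clade + its sister(s)
    return selected
```

On the toy tree `((("euk1", "euk2"), "bacA"), ("bacB", "bacC"))`, only `bacA` is retained as the closest bacterial relative.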

Tree topology tests 

To assess the significance of gene duplication in each of the maturation pathway enzymes, alternative trees constraining two or more separate subclades of a given lineage to be monophyletic were obtained from 20 searches using RAxML [67] with the models mentioned above. The best-scoring ML tree from each constrained search was then compared with the Bayesian tree. Site likelihoods were calculated in RAxML (-f g option) under the GTRGAMMA model of sequence evolution. The Approximately Unbiased (AU) test was performed using CONSEL 0.1k [68].
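The constrained searches can be set up by writing a multifurcating constraint tree in which only the lineage being tested is forced to be monophyletic. This is a sketch under stated assumptions: the taxon labels and file names are hypothetical, and the default `PROTGAMMAIWAG` model string is an assumption chosen to approximate WAG+I+Γ (the text names GTRGAMMA for the site-likelihood step, so the model is left as a parameter). The CONSEL commands for the AU test are not shown.

```python
def constraint_newick(constrained: list[str], others: list[str]) -> str:
    """Multifurcating constraint tree: `constrained` taxa form one clade, the rest are free."""
    if len(constrained) < 2 or not others:
        raise ValueError("need >= 2 constrained taxa and at least one other taxon")
    return f"(({','.join(constrained)}),{','.join(others)});"


def raxml_constraint_search(alignment: str, constraint_file: str, run_name: str,
                            model: str = "PROTGAMMAIWAG", searches: int = 20,
                            seed: int = 12345) -> list[str]:
    """RAxML command for `searches` independent ML searches under the constraint."""
    return ["raxmlHPC", "-g", constraint_file, "-s", alignment, "-m", model,
            "-p", str(seed), "-N", str(searches), "-n", run_name]


def raxml_site_likelihoods(alignment: str, trees_file: str, run_name: str,
                           model: str = "PROTGAMMAIWAG") -> list[str]:
    """RAxML -f g: per-site log-likelihoods for the trees in `trees_file` (input for CONSEL)."""
    return ["raxmlHPC", "-f", "g", "-s", alignment, "-m", model,
            "-z", trees_file, "-n", run_name]
```

For a test such as Fungi1+Fungi2, the constraint string would place all Fungi1 and Fungi2 sequences in one clade and leave every other taxon unresolved.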

Additional material 


Additional file 1: Identified homologs involved in the CL synthesis and maturation pathways in eukaryotes.

Additional file 2: Figure S1. Alignment of eukaryotic CLS_cap sequences (partial). The six conserved membrane-binding regions are designated I-VI, and amino acid residues conserved across the CAP family are boxed. Amino acid positions are numbered relative to the Monosiga brevicollis ortholog. # below the alignment indicates amino acid residues that are specific to CL synthases.

Additional file 3: Figure S2. The identified conserved motifs (the boxed 
regions) of CLS_pld from mitochondriate protists. Amino acid positions 
are numbered relative to the Plasmodium knowlesi ortholog (gi: 
221058144). 

Additional file 4: Figure S3. The identified conserved motifs of CLD of 
eukaryotes. Two conserved regions that might function as lipase and 
acyltransferase motifs are boxed. Amino acid positions are numbered 
relative to the Phytophthora ramorum ortholog (id: Pr_95977T0). 

Additional file 5: Figure S4. The identified conserved motifs of eukaryotic iPLA2. Two segments conserved among iPLA2 sequences are indicated by lines above the alignment. The conserved Ser and Asp residues that form a catalytic dyad, and the Gly-Gly dipeptide of the oxyanion hole, are indicated by asterisks.

Additional file 6: Figure S5. The ML phylogenetic tree of all the CLS_cap from eukaryotes and bacteria, and PGPS homologs of bacteria, corresponding to the Bayesian tree of Figure 2.

Additional file 7: Figure S6. Phylogeny of eukaryotic CLD homologs and similar bacterial sequences. The tree was constructed using MrBayes 3.1.2 and is illustrated using the same conventions as Figure 1. The monophyly constraint of Fungi (Fungi1+Fungi2) passed the AU test, suggesting that these copies might have arisen through lineage-specific gene duplication.

Additional file 8: Figure S7. Phylogeny of iPLA2 and related similar bacterial sequences. The tree was constructed using MrBayes 3.1.2 and is illustrated using the same conventions as Figure 1. The rejection of the monophyly hypothesis of Animalia (Animalia1+Animalia2) by the AU test (0.048) argues that iPLA2 beta and gamma diverged in the ancestor of Animalia, although the timing is hard to determine.

Additional file 9: Figure S8. Phylogeny of ALCAT and AGPAT 3/4. The tree was constructed using MrBayes 3.1.2 and is illustrated using the same conventions as Figure 1. AGPAT 3/4 were used as the outgroup to root the tree, based on our preliminary analyses. Alternative trees constraining all Stramenopiles to be monophyletic were rejected, suggesting that gene duplication occurred in the ancestor of Stramenopiles.

Additional file 10: Figure S9. Phylogeny of TAZ and similar bacterial sequences. The tree was constructed using MrBayes 3.1.2 and is illustrated using the same conventions as Figure 1. Hypothetical trees constraining all Archaeplastids to be monophyletic were rejected, suggesting that gene duplication occurred in the ancestor of Archaeplastids.

Additional file 11: Comparison between the Bayesian tree and alternative topologies.

Additional file 12: Table S3. The download sites of the eukaryotic genome or EST databases included in the analyses.